NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

GSplit: Scaling Graph Neural Network Training on Large Graphs via Probabilistic Splitting

Polisetty, Sandeep; Liu, Juelin; Falus, Jacob; Fung, Yiren; Lim, Seung Hwan; Guan, Hui; Serafini, Marco (May 2025, Proceedings of the 8th MLSys Conference)

Free, publicly-accessible full text available May 17, 2026
Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch

https://doi.org/10.14778/3717755.3717776

Bajaj, Saurabh; Son, Hojae; Liu, Juelin; Guan, Hui; Serafini, Marco (December 2024, Proceedings of the VLDB Endowment)

Graph Neural Networks (GNNs) have gained significant attention in recent years due to their ability to learn representations of graph-structured data. Two common methods for training GNNs are mini-batch training and full-graph training. Since these two methods require different training pipelines and systems optimizations, two separate classes of GNN training systems emerged, each tailored for one method. Works that introduce systems belonging to a particular category predominantly compare them with other systems within the same category, offering limited or no comparison with systems from the other category. Some prior work also justifies its focus on one specific training method by arguing that it achieves higher accuracy than the alternative. The literature, however, has incomplete and contradictory evidence in this regard. In this paper, we provide a comprehensive empirical comparison of representative full-graph and mini-batch GNN training systems. We find that the mini-batch training systems consistently converge faster than the full-graph training ones across multiple datasets, GNN models, and system configurations. We also find that mini-batch training techniques converge to accuracy values similar to, and often higher than, those of full-graph training, showing that mini-batch sampling is not necessarily detrimental to accuracy. Our work highlights the importance of comparing systems across different classes, using time-to-accuracy rather than epoch time for performance comparison, and selecting appropriate hyperparameters for each training method separately.
Full Text Available
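The Bajaj et al. abstract above recommends time-to-accuracy over epoch time. A minimal sketch of that metric follows, assuming a hypothetical log format of (elapsed seconds, validation accuracy) pairs recorded after each evaluation. The function name and the numbers are invented for illustration and are not results from the paper.

```python
from typing import Optional, Sequence, Tuple


def time_to_accuracy(log: Sequence[Tuple[float, float]], target: float) -> Optional[float]:
    """Return the elapsed time (seconds) of the first evaluation whose
    validation accuracy reaches `target`, or None if it never does.

    `log` is assumed to be a chronologically ordered list of
    (elapsed_seconds, validation_accuracy) pairs, one per evaluation.
    """
    for elapsed, acc in log:
        if acc >= target:
            return elapsed
    return None


# Made-up logs, NOT measurements from the paper: the full-graph run has
# 10 s epochs, the mini-batch run has 25 s epochs but needs fewer of them.
full_graph = [(10.0 * (i + 1), acc) for i, acc in
              enumerate([0.50, 0.58, 0.63, 0.66, 0.69, 0.71, 0.72])]
mini_batch = [(25.0, 0.68), (50.0, 0.72), (75.0, 0.73)]

print(time_to_accuracy(full_graph, 0.72))  # 70.0
print(time_to_accuracy(mini_batch, 0.72))  # 50.0
```

Judged by epoch time alone, the full-graph run in this toy example looks 2.5× faster; judged by the time needed to reach 0.72 validation accuracy, the mini-batch run finishes first.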
FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs

https://doi.org/10.1007/s00778-024-00867-8

Yang, Yifei; Yu, Xiangyao; Serafini, Marco; Aboulnaga, Ashraf; Stonebraker, Michael (September 2024, The VLDB Journal)

Modern cloud-native OLAP databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Computation pushdown is a promising solution to tackle this issue, which offloads some computation tasks to the storage layer to reduce network traffic. This paper presents FlexPushdownDB (FPDB), where we revisit the design of computation pushdown in a storage-disaggregation architecture, and then introduce several optimizations to further accelerate query processing. First, FPDB supports hybrid query execution, which combines local computation on cached data and computation pushdown to cloud storage at a fine granularity. Within the cache, FPDB uses a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Second, we design adaptive pushdown as a new mechanism to avoid throttling the storage-layer computation during pushdown, which pushes the request back to the computation layer at runtime if the storage-layer computational resource is insufficient. Finally, we derive a general principle to identify pushdown-amenable computational tasks, by summarizing common patterns of pushdown capabilities in existing systems, and further propose two new pushdown operators, namely, selection bitmap and distributed data shuffle. Evaluation on SSB and TPC-H shows that these three optimizations improve performance by 2.2×, 1.9×, and 3×, respectively.
Full Text Available
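The FPDB abstract above describes adaptive pushdown: a pushdown request is pushed back to the computation layer when the storage layer lacks computational resources. A simplified, single-threaded sketch of that decision rule follows. It assumes a hypothetical storage node with a fixed number of "slots" as the resource signal; how FPDB actually measures storage-layer load is not stated in the abstract, and `StorageNode` and `run_selection` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StorageNode:
    """Hypothetical storage-layer node with a fixed number of slots
    available for pushdown work (an assumed stand-in for real load signals)."""
    max_slots: int
    busy: int = 0

    def try_acquire(self) -> bool:
        # Accept a pushdown request only if a slot is free.
        if self.busy < self.max_slots:
            self.busy += 1
            return True
        return False

    def release(self) -> None:
        self.busy -= 1


def run_selection(rows: List[int], pred: Callable[[int], bool],
                  node: StorageNode) -> Tuple[str, List[int]]:
    """Evaluate a selection at the storage layer when it has capacity;
    otherwise push the request back and evaluate it at the compute layer."""
    if node.try_acquire():
        try:
            # Pushdown: only matching rows would cross the network.
            return "pushdown", [r for r in rows if pred(r)]
        finally:
            node.release()
    # Pushback: the raw rows are shipped and filtered by the compute layer.
    shipped = list(rows)
    return "pushback", [r for r in shipped if pred(r)]


def is_even(x: int) -> bool:
    return x % 2 == 0


print(run_selection([1, 2, 3, 4], is_even, StorageNode(max_slots=2)))
print(run_selection([1, 2, 3, 4], is_even, StorageNode(max_slots=2, busy=2)))
```

Both paths return the same rows; what changes is where the filter runs and how much data would cross the network, which is the trade-off adaptive pushdown manages at runtime.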
GMorph: Accelerating Multi-DNN Inference via Model Fusion

Yang, Qizheng; Yang, Tianyi; Xiang, Mingcan; Zhang, Lijun; Wang, Haoliang; Serafini, Marco; Guan, Hui (April 2024, ACM EuroSys'24)

AI-powered applications often involve multiple deep neural network (DNN)-based prediction tasks to support application-level functionalities. However, executing multiple DNNs can be challenging due to the high resource demands and computation costs that increase linearly with the number of DNNs. Multi-task learning (MTL) addresses this problem by designing a multi-task model that shares parameters across tasks based on a single backbone DNN. This paper explores an alternative approach called model fusion: rather than training a single multi-task model from scratch as MTL does, model fusion fuses multiple task-specific DNNs that are pre-trained separately and can have heterogeneous architectures into a single multi-task model. We materialize model fusion in a software framework called GMorph to accelerate multi-DNN inference while maintaining task accuracy. GMorph features three main technical contributions: graph mutations to fuse multiple DNNs into resource-efficient multi-task models, search-space sampling algorithms, and predictive filtering to reduce the high search costs. Our experiments show that GMorph can outperform MTL baselines and reduce the inference latency of multi-DNNs by 1.1-3× while meeting the target task accuracy.
Full Text Available
GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs

https://doi.org/10.1109/PACT58117.2023.00026

Liu, Juelin; Polisetty, Sandeep; Guan, Hui; Serafini, Marco (October 2023, IEEE)

Graph pattern matching is a fundamental problem encountered by many common graph mining tasks and the basic building block of several graph mining systems. This paper explores for the first time how to proactively prune graphs to speed up graph pattern matching by leveraging the structure of the query pattern and the input graph. We propose building auxiliary graphs, which are different pruned versions of the graph, during query execution. This requires careful balancing between the upfront cost of building and managing auxiliary graphs and the gains of faster set operations. To this end, we propose GraphMini, a new system that uses query compilation and a new cost model to minimize the cost of building and maintaining auxiliary graphs and maximize gains. Our evaluation shows that using GraphMini can achieve one order of magnitude speedup compared to state-of-the-art subgraph enumeration systems on commonly used benchmarks.
Full Text Available
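GraphMini's auxiliary graphs are pruned versions of the input graph, built during query execution so that set operations run on smaller neighbor sets. The abstract does not describe how they are constructed, so the sketch below swaps in a much simpler, well-known pruning: ordering vertices by id and keeping only higher-id neighbors (the "forward" adjacency used in classic triangle counting). This is not GraphMini's cost-model-driven method; it only illustrates how a pre-pruned graph shrinks the intersections in a pattern-matching loop. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Undirected adjacency sets; self-loops are ignored."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def build_forward_aux(adj: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    # Pruned copy of the graph: keep only neighbors with a larger id.
    return {u: {v for v in nbrs if v > u} for u, nbrs in adj.items()}


def count_triangles_naive(adj: Dict[int, Set[int]]) -> int:
    count = 0
    for u in adj:
        for v in adj[u]:
            count += len(adj[u] & adj[v])  # full neighbor sets intersected
    return count // 6  # every triangle is found once per ordered vertex pair


def count_triangles_pruned(adj: Dict[int, Set[int]]) -> int:
    fwd = build_forward_aux(adj)
    count = 0
    for u, nbrs in fwd.items():
        for v in nbrs:
            # Only w > v > u can survive; each triangle is counted once.
            count += len(nbrs & fwd[v])
    return count


adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)])
print(count_triangles_naive(adj), count_triangles_pruned(adj))  # 2 2
```

The pruned version intersects smaller sets and avoids the six-fold duplicate work, at the upfront cost of building the auxiliary adjacency; the abstract describes balancing exactly this kind of build cost against faster set operations.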
Accelerating Graph Sampling for Graph Machine Learning using GPUs

https://doi.org/10.1145/3447786.3456244

Jangda, Abhinav; Polisetty, Sandeep; Guha, Arjun; Serafini, Marco (April 2021, European Conference on Computer Systems (EuroSys))
Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling. Sampling is an "embarrassingly parallel" problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems.
Full Text Available
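The NextDoor abstract above introduces transit-parallelism for graph sampling. The core idea, as described there, is to organize work around transit vertices (the vertices whose edges are being sampled from) rather than around individual samples, so that a transit's edges can be loaded once and shared. The sketch below is a sequential CPU illustration of that grouping for uniform random walks. It is an assumption-laden simplification, not NextDoor's GPU implementation or API; `random_walks_by_transit` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List


def random_walks_by_transit(adj: Dict[int, List[int]], roots: List[int],
                            length: int, seed: int = 0) -> List[List[int]]:
    """Uniform random walks, advanced one step at a time, with the work at
    each step grouped by transit vertex (where each walk currently sits)."""
    rng = random.Random(seed)
    walks = [[r] for r in roots]
    for _ in range(length):
        # Group sample ids by their current transit vertex.
        by_transit: Dict[int, List[int]] = defaultdict(list)
        for sample_id, walk in enumerate(walks):
            by_transit[walk[-1]].append(sample_id)
        # Handle each transit once: its neighbor list is looked up a single
        # time and reused for every sample currently on that vertex.
        for transit, sample_ids in by_transit.items():
            nbrs = adj.get(transit, [])
            if not nbrs:
                continue  # dead end: these walks stop growing
            for sample_id in sample_ids:
                walks[sample_id].append(rng.choice(nbrs))
    return walks


graph = {0: [1, 2], 1: [2], 2: [0, 1]}
for walk in random_walks_by_transit(graph, roots=[0, 0, 1], length=4):
    print(walk)
```

On a GPU, each transit group would map to a block of threads that caches the transit's edges in fast memory; here the grouping only shows how the work is reorganized.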
FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS

https://doi.org/10.14778/3476249.3476265

Yang, Yifei; Youill, Matt; Woicik, Matthew; Liu, Yizhou; Yu, Xiangyao; Serafini, Marco; Aboulnaga, Ashraf; Stonebraker, Michael (July 2021, Proceedings of the VLDB Endowment)
Modern cloud databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Two solutions have been explored to mitigate the bottleneck: caching and computation pushdown. While both techniques can significantly reduce network traffic, existing DBMSs consider them as orthogonal techniques and support only one or the other, leaving potential performance benefits unexploited. In this paper we present FlexPushdownDB (FPDB), an OLAP cloud DBMS prototype that supports fine-grained hybrid query execution to combine the benefits of caching and computation pushdown in a storage-disaggregation architecture. We build a hybrid query executor based on a new concept called separable operators to combine the data from the cache and results from the pushdown processing. We also propose a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Our experimental evaluation on the Star Schema Benchmark shows that the hybrid execution outperforms both the conventional caching-only architecture and pushdown-only architecture by 2.2×. In the hybrid architecture, our experiments show that Weighted-LFU can outperform the baseline LFU by 37%.
Full Text Available
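Both FPDB abstracts mention a Weighted-LFU cache replacement policy that accounts for the cost of pushdown computation. The abstracts do not give the weighting formula, so the toy sketch below assumes one: a segment's value is its access count multiplied by a caller-supplied weight, read here as a hypothetical estimate of how costly it would be to serve that segment through pushdown instead of from the cache. The class name, the weight semantics, and the admission check are all illustrative assumptions.

```python
from typing import Callable, Dict


class WeightedLFUCache:
    """Toy Weighted-LFU cache. A segment's value is its access count times
    a caller-supplied weight; the lowest-valued segment is evicted first."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.freq: Dict[str, int] = {}      # access counts, cached or not
        self.weight: Dict[str, float] = {}  # hypothetical pushdown-cost weights
        self.data: Dict[str, bytes] = {}    # cached segments

    def _score(self, key: str) -> float:
        return self.freq[key] * self.weight[key]

    def get(self, key: str, weight: float, fetch: Callable[[], bytes]) -> bytes:
        self.freq[key] = self.freq.get(key, 0) + 1
        self.weight[key] = weight
        if key in self.data:
            return self.data[key]           # cache hit
        value = fetch()                     # miss: load from storage/pushdown
        if len(self.data) >= self.capacity:
            victim = min(self.data, key=self._score)
            if self._score(victim) >= self._score(key):
                return value                # newcomer not worth caching
            del self.data[victim]
        self.data[key] = value
        return value


cache = WeightedLFUCache(capacity=1)
cache.get("a", weight=1.0, fetch=lambda: b"A")
cache.get("a", weight=1.0, fetch=lambda: b"A")  # "a": 2 accesses x 1.0 = 2.0
cache.get("b", weight=5.0, fetch=lambda: b"B")  # "b": 1 access  x 5.0 = 5.0
print(sorted(cache.data))  # ['b']
```

A plain LFU cache with the same admission check would keep "a" (two accesses versus one); weighting by cost lets the rarely used but expensive segment "b" win. Ties favor the segment already in the cache.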
